Network visualization (in R) with “netplot” and motif counting (in C++) with “barry”

SCI Seminar

George G. Vega Yon, Ph.D.

Division of Epidemiology

University of Utah

2023-04-07

Whoami

  • Research Assistant Professor of Epidemiology.

  • Ph.D. in Biostatistics from USC and M.Sc. in Economics from Caltech.

  • Methodologist working at the intersection between Statistical Computing and Complex Systems Modeling.

You can download the slides from
ggv.cl/slides/sci2023

Network visualization with netplot

netplot In a nutshell

  • What: An R package for network visualization inspired by Gephi.

  • Why: Opinionated way to visualize graphs.1

  • Where: You can get the dev version on GitHub (USCCANA/netplot) or the stable version on CRAN.

netplot In a nutshell (cont.)

Current parameters in nplot (main function)

nplot(
  edgelist,
  layout,
  vertex.size             = 1,
  bg.col                  = "transparent",
  vertex.nsides           = 10,
  vertex.color            = grDevices::hcl.colors(1),
  vertex.size.range       = c(0.01, 0.03),
  vertex.frame.color      = ...,
  vertex.rot              = 0,
  vertex.frame.prop       = 0.2,
  vertex.label            = NULL,
  vertex.label.fontsize   = NULL,
  vertex.label.color      = "black",
  vertex.label.fontfamily = "HersheySans",
  vertex.label.fontface   = "bold",
  vertex.label.show       = 0.3,
  vertex.label.range      = c(5, 15),
  edge.width              = 1,
  edge.width.range        = c(1, 2),
  edge.arrow.size         = NULL,
  edge.color              = ~ego(alpha = 0.25, col = "gray") + alter,
  edge.curvature          = pi/3,
  edge.line.lty           = "solid",
  edge.line.breaks        = 5,
  sample.edges            = 1,
  skip.vertex             = FALSE,
  skip.edges              = FALSE,
  skip.arrows             = skip.edges,
  add                     = FALSE,
  zero.margins            = TRUE,
  ...
)

Main features

  • Visualization engine: The grid system (same used by ggplot2.)

  • Layout algorithms: Default uses igraph’s layout.

  • Vertex sizes: Relative to the drawing area.

Example code+output

The personal friendship network of a faculty of a UK university, consisting of 81 vertices (individuals) and 817 directed and weighted connections. The school affiliation of each individual is stored as a vertex attribute. This dataset can serve as a testbed for community detection algorithms.

How?

library(netplot)
library(igraph)

library(igraphdata)
data("UKfaculty")

# Vertex colors f(group)
vcols <- V(UKfaculty)$Group 
vcols <- palette.colors(
  n = length(unique(vcols))
)[vcols]

set.seed(323)
# Netplot call
nplot(
  UKfaculty,
  edge.line.breaks = 20,
  vertex.color     = vcols
  )

How? (cont)

Things to notice:

  • Vertex size autoscaled to the device size.

  • Edged colored mixing ego and alter (source+target.)

  • Edges change colors continuously (gradient.)

  • Vertices and edges’ sizes scale as required by the user.

Structure

Graphical objects (Grobs)

List of 11
 $ .xlim        : num [1:2] -1 1
 $ .ylim        : num [1:2] -0.5 0.5
 $ .layout      : num [1:81, 1:2] 0.6661 0.0201 0.7327 0.5399 -0.4903 ...
 $ .edgelist    : num [1:817, 1:2] 57 76 12 43 28 58 7 40 5 48 ...
 $ .N           : int 81
 $ .M           : int 817
 $ name         : chr "graph.3"
 $ gp           : NULL
 $ vp           : NULL
 $ children     :List of 2
  ..$ background:List of 10
  .. ..- attr(*, "class")= chr [1:3] "rect" "grob" "gDesc"
  ..$ graph     :List of 5
  .. ..- attr(*, "class")= chr [1:3] "gTree" "grob" "gDesc"
  ..- attr(*, "class")= chr "gList"
 $ childrenOrder: chr [1:2] "background" "graph"
 - attr(*, "class")= chr [1:4] "netplot" "gTree" "grob" "gDesc"

Example 2: Playing with patterns

netplot supports advanced patterns. The figures feature radial gradients (vertices), lineal gradients, and repeated patterns (background).

Other things to consider

In the case of ggplot2 (and thus, ggraph)

  • Patterns in R’s grid system are not directly available.1

While ggplot2 uses grid underneath it’s grammar API, these features are generally not directly available in ggplot2.
– Thomas Lin Pedersen, author of ggraph (source: tidyverse.org)

  • But the gggrid package does:

The ‘ggplot2’ package does not yet have an interface for pattern fills, but the ‘gggrid’ package (Murrell, 2022) allows us to combine raw ‘grid’ output with the ‘ggplot2’ plot.
– Paul Murrel, author of grid (source: Vectorised Pattern Fills in R Graphics)

Counting motifs with barry

barry in a nutshell

  • What: A C++ header-only template library for motif counting (and more.)

  • Why: Implement Discrete Exponential Family Models [DEFMs] for phylogenetics and social networks analysis.

  • Where: You can get it on GitHub (USCBiostats/barry)

Main features

About 11 K lines of C++ code built for statistical modeling:

  • Motif count using change statistics (we will return to that.)

  • Full and constrained enumeration of 0/1 arrays.

  • Computes probability function for Discrete Exponential-Family Models [DEFMs].

  • Memory and computationally efficient for pooled models.

Features: Change statistics

Let’s look into the change statistics edgecount, triangles, and gender-homophily when we remove tie 33-69.

Features: Change statistics (cont.)

s() y- y+ change
Edgecount 816 817 1
Triangles 5366 5399 33
Group-homophily 664 665 1

Features: Memory and Comp. Efficiency

  • A fundamental feature of pooled models (multiple graphs/arrays).
  • A single model may feature thousands of networks.
  • But if all have the same number of nodes (and other features)… we only need to enumerate once.

Modeling

flowchart LR
  start[Create the\nmodel] --> count[Add\ncounters]
  count --> count_done{Done?}
  count_done --> |Yes| const[Add\nconstraints]
  count_done --> |No| count
  const --> const_done{Done?}
  const_done --> |Yes| init((End))
  const_done --> |No| const
flowchart LR
  start[Hash the\narray] --> exists{Present?}
  exists --> |Yes| ptr[[Link\nPointer]]
  exists --> |No| addarray[[Compute\nsupport]]
  ptr --> End((End))
  addarray --> End

Current implemented models

In principle, any dataset we can represent as 0/1 arrays can be modeled with barry. We have applications in social networks, phylogenetics, and health outcomes.

Current implemented models: DEFMs

  • DEFMs for multiple correlated outcomes (Markov Random Fields; on development with Drs. MJ Pugh and Tom Valente.)

Current implemented models: Social networks

Current implemented models: Phylogenetics

  • Modeling the evolution of gene functions in terms of transition between functional states (research grant submitted to National Human Genome Research Institute NHGRI).

Other projects

fmcmc | ergmito | aphylo | netdiffuseR | ABCoptim
slurmR | barry | rgexf | geese

Other projects (cont.)

Using rgexf (my most popular R package w/ 600K downloads)

Networks co-session network at INSNA’s Sunbelt 2022: Nodes are colored according to their roles: speaker, session chair, and session organizer.–(source: gvegayon/gallery)

Thanks!

gvegayon
@gvegayon
ggvy.cl

Presentation created with quarto::revealjs

A1. Change statistics formals

  • The change statistic is defined as a real-valued vector where the \(k\)-th entry equals the observed change when the \(ij\)-th tie is removed from the network. Formally:

    \[ \delta(y_{ij}: 0\to 1) = s(\mathbf{y})_{ij}^+ - s(\mathbf{y})_{ij}^- \]

    Where \(s(\cdot)\) is a function returning graph \(\mathbf{y}\)’s observed statistics, and \(s(\mathbf{y})_{ij}^+\) represents its value when \(y_{ij} = 1\).

A1. Change statistics formals (cont.)

Furthermore, conditioning on the rest of the network (or array,) a tie transition probability equates a logit:

\[\begin{equation} \mbox{logit}\left({\mathbb{P}\left(y_{ij} = 1|y_{-ij}\right) }\right) = {\theta}^\mathbf{t}\Delta\delta\left(y_{ij}:0\to 1\right), \end{equation}\]

\[\begin{equation} {\mathbb{P}\left(y_{ij} = 1|y_{-ij}\right) } = \frac{1}{1 + \mbox{exp}\left\{-{\theta}^\mathbf{t}\Delta\delta\left(y_{ij}:0\to 1\right)\right\}} \end{equation}\]